Abstract

The goal of this paper is to review the main trends in the domain of
uncertainty principles and localization, emphasize their mutual connections
and investigate practical consequences. The discussion is strongly oriented
towards, and motivated by, signal processing problems, in which significant
advances have been made recently. Relations with sparse approximation and
coding problems are emphasized.

1 Introduction 

Uncertainty inequalities generally express the impossibility for a function (or 
a vector in the discrete case) to be simultaneously sharply concentrated in two 
different representations, provided the latter are incoherent enough. Such a 
loose definition can be made concrete by further specifying the following main 
ingredients: 

• A global setting, generally a pair of Hilbert spaces (of functions or
vectors) providing two representations for the objects of interest (e.g. time
and frequency, or more general phase space variables).

• An invertible linear transform (operator, matrix) mapping one
representation to the other without information loss.

• A concentration measure for the elements of the two representation
spaces: variance, entropy, $L^p$ norms, ...

Many such settings have been proposed in the literature during the last century,
for various purposes. The first formulation was proposed in quantum mechanics,
where the uncertainty principle is still a major concern. However, it is not
restricted to this field and appears whenever one has to represent functions
and vectors in different ways to extract some specific information. This is
basically what is done in signal processing, where the uncertainty principle is of
growing interest.

The basic (quantum mechanical) prototype is provided by the so-called
Robertson-Schrödinger inequality, which establishes a lower bound for the product
of the variances of any two self-adjoint operators on a generic Hilbert space. The
most common version of the principle is as follows:

Theorem 1. Let $f \in \mathcal H$ (a Hilbert space), with $\|f\| = 1$. Let $A$ and $B$ be
(possibly unbounded) self-adjoint operators on $\mathcal H$ with respective domains $D(A)$
and $D(B)$. Define the mean and variance of $A$ in state $f \in D(A)$ by

$$e_f(A) = \langle Af, f\rangle, \qquad v_f(A) = e_f(A^2) - e_f(A)^2.$$

Setting $[A, B] = AB - BA$ and $\{A, B\} = AB + BA$, we have, for all $f \in D(AB) \cap D(BA)$,

$$v_f(A)\,v_f(B) \ge \frac14\Big[\,|e_f([A, B])|^2 + |e_f(\{A - e_f(A), B - e_f(B)\})|^2\,\Big].$$

The quantities $v_f(A)$ and $v_f(B)$ can also be interpreted as the variances of
two representations of $f$, given by its projections onto the respective bases of
(possibly generalized) eigenvectors of $A$ and $B$. From the self-adjointness of $A$
and $B$, there exists a unitary operator mapping one representation to the other.
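A finite-dimensional sanity check of Theorem 1 may help (our own illustration; the matrices, states and helper names are arbitrary choices, not taken from the text). For a pure qubit state and the Pauli matrices $\sigma_x, \sigma_y$, writing $(x, y, z)$ for the Bloch vector, both sides equal $z^2 + x^2y^2$, so the inequality is in fact an equality; the spin-1 example is only checked against the inequality.

```python
import cmath
import math

def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def expect(a, f):
    """e_f(A) = <Af, f> = sum_i (Af)_i conj(f_i), for a unit vector f."""
    n = len(f)
    af = [sum(a[i][k] * f[k] for k in range(n)) for i in range(n)]
    return sum(u * v.conjugate() for u, v in zip(af, f))

def combine(a, b, sign):
    """Entrywise a + sign * b."""
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def shift(a, c):
    """A - c * Identity."""
    return [[a[i][j] - (c if i == j else 0) for j in range(len(a))] for i in range(len(a))]

def robertson_schrodinger(a, b, f):
    """Return (v_f(A) v_f(B), right-hand side of Theorem 1) for Hermitian A, B."""
    ea, eb = expect(a, f).real, expect(b, f).real
    va = expect(matmul(a, a), f).real - ea ** 2
    vb = expect(matmul(b, b), f).real - eb ** 2
    comm = combine(matmul(a, b), matmul(b, a), -1)        # [A, B]
    a0, b0 = shift(a, ea), shift(b, eb)
    anti = combine(matmul(a0, b0), matmul(b0, a0), +1)    # {A - e_f(A), B - e_f(B)}
    rhs = 0.25 * (abs(expect(comm, f)) ** 2 + abs(expect(anti, f)) ** 2)
    return va * vb, rhs

SX = [[0, 1], [1, 0]]                            # Pauli matrices
SY = [[0, -1j], [1j, 0]]
s = 1 / math.sqrt(2)
JX = [[0, s, 0], [s, 0, s], [0, s, 0]]           # spin-1 operators
JY = [[0, -1j * s, 0], [1j * s, 0, -1j * s], [0, 1j * s, 0]]

qubit = [complex(math.cos(0.7)), cmath.exp(0.5j) * math.sin(0.7)]
print(robertson_schrodinger(SX, SY, qubit))      # equal up to rounding
spin1 = [1 / 3 + 0j, 2j / 3, 2 / 3 + 0j]          # unit vector
print(robertson_schrodinger(JX, JY, spin1))
```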

The proof of this result is quite generic and carries over to many situations.
However, the choice of the variance to measure concentration properties may be
quite questionable in a number of practical situations, and several alternatives
have been proposed and studied in the literature.

The goal of this paper is to summarize a part of the literature on this topic,
discuss a few recent results and focus on specific signal processing applications.
We shall first describe the continuous setting, before moving to discrete
formulations and emphasizing the main differences. Given the space limitations, the
current paper cannot be exhaustive. We have selected a few examples which
highlight the structure and some important aspects of the uncertainty principle.
We refer for example to [T5] for a very good and complete account of classical
uncertainty relations, focused on time-frequency uncertainty. An information-theoretic
point of view on the uncertainty principle may be found in [5], and a
review of entropic uncertainty principles has been given in [29]. More recent
contributions introducing new localization measures, mainly in the sparse
approximation literature, will be mentioned in the core of the paper.
2 Some fundamental aspects of the uncertainty
principle 

2.1 Signal representations 

The uncertainty principle is usually understood as a relation between the
simultaneous spreadings of a function and its Fourier transform. More generally,
as expressed in Theorem 1, it also expresses a relation between any two
representations, namely those given by the projections of the signal onto the
(possibly generalized) eigenbases of $A$ and $B$. Can a representation be something
other than the projection onto a (generalized) eigenbasis? The answer is yes:
representations can also be obtained by introducing frames. A set of vectors
$\mathcal U = \{u_k\}_k$ in a Hilbert space $\mathcal H$ is a frame of $\mathcal H$ if for all $f \in \mathcal H$:

$$A\|f\|^2 \le \sum_k |\langle f, u_k\rangle|^2 \le B\|f\|^2, \qquad (1)$$

where $A, B$ are two constants such that $0 < A \le B < \infty$. Since $A > 0$,
any $f \in \mathcal H$ can be recovered from its frame coefficients $\{\langle f, u_k\rangle\}_k$. This is a
key point: in order to compare two representations, no information must be
lost in the process. Orthonormal bases are particular cases of frames for which
$A = B = 1$ and the frame vectors are orthogonal.

Denote by $U : f \in \mathcal H \mapsto \{\langle f, u_k\rangle\}_k$ the so-called analysis operator. $U$ is left
invertible, which yields inversion formulas of the form

$$f = \sum_k \langle f, u_k\rangle\, \tilde u_k,$$

where $\tilde{\mathcal U} = \{\tilde u_k\}_k$ is another family of vectors in $\mathcal H$, which can also be shown
to be a frame, termed a dual frame of $\mathcal U$. Choosing as left inverse the
Moore-Penrose pseudo-inverse $U^{-1} = U^\dagger$ yields the so-called canonical dual frame
$\tilde{\mathcal U}^\circ = \{\tilde u_k = (U^*U)^{-1}u_k\}_k$, but other choices are possible.
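A minimal numerical sketch of these definitions in $\mathbb R^2$ (our own illustration, not taken from the text): for the three-vector "Mercedes-Benz" frame, the frame operator $S = \sum_k u_k u_k^T$ equals $\frac32 I$, so the optimal frame bounds are $A = B = 3/2$, the canonical dual vectors are $\tilde u_k = S^{-1}u_k = \frac23 u_k$, and $f = \sum_k \langle f, u_k\rangle \tilde u_k$. The choice of frame and all function names are hypothetical.

```python
import math

def mercedes_benz_frame():
    """Three unit vectors of R^2 at 120 degree intervals (a tight frame)."""
    angles = [math.pi / 2 + 2 * math.pi * k / 3 for k in range(3)]
    return [(math.cos(t), math.sin(t)) for t in angles]

def frame_operator(frame):
    """S = sum_k u_k u_k^T, a symmetric 2x2 matrix (list of rows)."""
    return [[sum(u[i] * u[j] for u in frame) for j in range(2)] for i in range(2)]

def frame_bounds(s):
    """Optimal frame bounds A <= B: the extreme eigenvalues of the symmetric 2x2 matrix S."""
    half_trace = (s[0][0] + s[1][1]) / 2
    gap = math.hypot((s[0][0] - s[1][1]) / 2, s[0][1])
    return half_trace - gap, half_trace + gap

def canonical_dual(frame):
    """Canonical dual vectors S^{-1} u_k (explicit 2x2 inverse)."""
    s = frame_operator(frame)
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    return [(inv[0][0] * u[0] + inv[0][1] * u[1], inv[1][0] * u[0] + inv[1][1] * u[1])
            for u in frame]

def reconstruct(f, frame, dual):
    """f = sum_k <f, u_k> u~_k: analysis with the frame, synthesis with the dual."""
    coeffs = [f[0] * u[0] + f[1] * u[1] for u in frame]
    return tuple(sum(c * d[i] for c, d in zip(coeffs, dual)) for i in range(2))

frame = mercedes_benz_frame()
A, B = frame_bounds(frame_operator(frame))
dual = canonical_dual(frame)
f = (0.3, -1.7)
print(A, B)                          # both equal 3/2 up to rounding
print(reconstruct(f, frame, dual))   # recovers f up to rounding
```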

The uncertainty principle can be naturally extended to frame representations,
i.e. representations of vectors $f \in \mathcal H$ by their frame coefficients. As
before, uncertainty inequalities limit the extent to which a vector can have two
arbitrarily concentrated frame representations. Since variances are not necessarily
well defined in such a case, other concentration measures, such as entropies,
have to be used. For example, bounds for the entropic uncertainty principle are
derived in [27].

2.2 How different are two representations? The mutual
coherence

A second main aspect of uncertainty inequalities is the heuristic remark that the 
more different the representations, the more constraining the bounds. However, 
one needs to be able to measure how different two representations are. This is 
where the notion of coherence enters. 
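Let us first stick to frames in the discrete setting. Let $\mathcal U = \{u_k\}_k$ and
$\mathcal V = \{v_k\}_k$ be two frames of $\mathcal H$, with analysis operators $U$ and $V$. Let us define the
operator $T = VU^{-1}$, which allows one to pass from the representation of $f$ in $\mathcal U$ to the
one in $\mathcal V$. It is given by

$$(TUf)_j = \Big\langle \sum_k (Uf)_k\, \tilde u_k,\ v_j \Big\rangle = \sum_k (Uf)_k\, \langle \tilde u_k, v_j\rangle.$$

This relation shows that in finite dimension $T$ is represented by a matrix
$G = G(\mathcal U, \mathcal V)$ (the cross Gram matrix of $\tilde{\mathcal U}$ and $\mathcal V$, which reduces to that of $\mathcal U$
and $\mathcal V$ for orthonormal bases), defined by $G_{jk} = \langle \tilde u_k, v_j\rangle$. The matrix $G$ encodes the
differences between the two frames. The latter can be measured by various norms of $T$,
among which the so-called mutual coherence:

$$\mu = \mu(\mathcal U, \mathcal V) = \max_{j,k} |\langle u_k, v_j\rangle| = \max_{j,k} |G_{jk}| = \|T\|_{\ell^1 \to \ell^\infty}, \qquad (2)$$

where the second equality uses $\tilde u_k = u_k$, as is the case for orthonormal bases.
This quantity encodes (to some extent) the algebraic properties of $T$.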

Remark 1. This particular quantity (norm) may be generalized to other kinds
of norms which, depending on the setting, may be more appropriate for
estimating the correlation between the two representations. Indeed, it is the
characterization of the matrix $T$ that quantifies how close two representations are.

Remark 2. In the standard ($N$-dimensional) case where the uncertainty is
stated between the Kronecker and Fourier bases, $|T_{jk}| = 1/\sqrt N$ for all $j, k$.
These bases are said to be mutually unbiased, and $\mu = 1/\sqrt N$ is the smallest
possible value of $\mu$.
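A minimal sketch (our own illustration; the function names are hypothetical) computing the mutual coherence (2) of the Kronecker and unitary Fourier bases of $\mathbb C^N$ directly from the inner products $\langle u_k, v_j\rangle$; it returns $1/\sqrt N$, as stated in Remark 2.

```python
import cmath
import math

def kronecker_basis(n):
    """Canonical (Dirac) basis of C^n."""
    return [[1.0 if j == k else 0.0 for j in range(n)] for k in range(n)]

def fourier_basis(n):
    """Unitary Fourier basis: phi_k[j] = exp(2i pi j k / n) / sqrt(n)."""
    return [[cmath.exp(2j * math.pi * j * k / n) / math.sqrt(n) for j in range(n)]
            for k in range(n)]

def inner(x, y):
    """<x, y> = sum_j x_j conj(y_j)."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def mutual_coherence(basis_u, basis_v):
    """mu(U, V) = max_{j,k} |<u_k, v_j>|."""
    return max(abs(inner(u, v)) for u in basis_u for v in basis_v)

for n in (4, 8, 16):
    print(n, mutual_coherence(kronecker_basis(n), fourier_basis(n)), 1 / math.sqrt(n))
```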

In the case of the entropic uncertainty principle, the proof of the
inequality is based on the Riesz interpolation theorem and relies on bounds
of $T$ as an operator from $\ell^1$ to $\ell^\infty$ and from $\ell^2$ to $\ell^2$ (see Section 4.3). As we
shall see, this notion of mutual coherence appears in most uncertainty
relations. A notable exception is the variance-based uncertainty principle. In
this case it is replaced by the commutation relation between the two self-adjoint
operators, and the connection with the coherence is not straightforward.



2.3 The notion of phase space 

Standard uncertainty principles are associated with pairs of representations: 
time localization vs frequency localization, time localization vs scale localiza- 
tion,... However, in some situations, it is possible to introduce directly a 
phase space, which involves jointly the two representation domains, in which 
(non-separable) uncertainty principles can be directly formulated: joint time- 
frequency space, joint time-scale space,... 

Uncertainty principles associated with pairs of representations often have 
counterparts defined directly in the joint space. We shall see a few examples 
in the course of the current paper. In such situations, the mutual coherence is 
replaced with a notion of phase space coherence. 
3 Uncertainty inequalities in continuous settings:
a few remarkable examples 



To gain better insight into the uncertainty principle, we state here a few remarkable
results which illustrate the effect of changing (even slightly) the main
ingredients. This helps in understanding the choices made below in discrete settings.

The most popular and widespread form of the uncertainty principle uses the
variance as a spreading measure of a function and its Fourier transform. This
leads to the inequality stated in Theorem 1, where $A = X$ is the multiplication
operator $Xf(t) = tf(t)$ and $B = P = -\frac{i}{2\pi}\frac{d}{dt}$ is the derivative operator. This
first instance of uncertainty inequalities is associated with the so-called canonical
phase space, i.e. the time-frequency, or position-momentum, space. Let us first
introduce some notation. Given $f \in L^2(\mathbb R)$, denote by $\hat f$ its Fourier transform,
defined by

$$\hat f(\nu) = \int_{-\infty}^{\infty} f(t)\, e^{-2i\pi\nu t}\, dt.$$

With this definition, the Fourier transform is a unitary operator $L^2(\mathbb R) \to
L^2(\mathbb R)$. The classical uncertainty inequalities state that for any $f \in L^2(\mathbb R)$, $f$
and $\hat f$ cannot be simultaneously sharply localized.
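This statement can be checked numerically with a minimal sketch (our own illustration; grid sizes and function names are arbitrary assumptions): approximating $\hat f$ by a Riemann sum and computing the normalized variances of $|f|^2$ and $|\hat f|^2$, the Gaussian $e^{-\pi t^2}$ attains the Heisenberg bound $1/(16\pi^2)$ for this Fourier normalization, while the first Hermite function $t\,e^{-\pi t^2}$ gives nine times that bound.

```python
import cmath
import math

def grid(half_width, step):
    """Symmetric sampling grid on [-half_width, half_width] with the given step."""
    n = int(round(half_width / step))
    return [k * step for k in range(-n, n + 1)]

def fourier(samples, ts, nus, step):
    """Riemann-sum approximation of f^(nu) = int f(t) exp(-2i pi nu t) dt."""
    return [step * sum(f * cmath.exp(-2j * math.pi * nu * t) for f, t in zip(samples, ts))
            for nu in nus]

def normalized_variance(values, xs, step):
    """v = int (x - e)^2 |g(x)|^2 dx / ||g||^2, with e the normalized mean."""
    w = [abs(g) ** 2 for g in values]
    norm2 = step * sum(w)
    mean = step * sum(x * wi for x, wi in zip(xs, w)) / norm2
    return step * sum((x - mean) ** 2 * wi for x, wi in zip(xs, w)) / norm2

def variance_product(f, half_width=5.0, step=0.02):
    """v_f * v_{f^}, using the same grid in time and in frequency."""
    ts = grid(half_width, step)
    samples = [f(t) for t in ts]
    spectrum = fourier(samples, ts, ts, step)
    return normalized_variance(samples, ts, step) * normalized_variance(spectrum, ts, step)

bound = 1 / (16 * math.pi ** 2)
gauss_ratio = variance_product(lambda t: math.exp(-math.pi * t * t)) / bound
hermite_ratio = variance_product(lambda t: t * math.exp(-math.pi * t * t)) / bound
print(gauss_ratio, hermite_ratio)   # close to 1 and to 9, respectively
```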

Heisenberg's inequality. Let $\mathcal H = L^2(\mathbb R)$ and consider the self-adjoint
operators $X$ and $P$, defined by $Xf(t) = tf(t)$ and $Pf(t) = -if'(t)/2\pi$. $X$ and
$P$ satisfy the commutation relation $[X, P] = \frac{i}{2\pi}\mathbf 1$, where $\mathbf 1$ is the identity
operator. For $f \in L^2(\mathbb R)$, denote by $e_f$ and $v_f$ its expectation and variance (see
Theorem 1):

$$e_f = e_f(X) = \frac{1}{\|f\|^2}\int_{-\infty}^{\infty} t\,|f(t)|^2\, dt, \qquad v_f = v_f(X) = \frac{1}{\|f\|^2}\int_{-\infty}^{\infty} (t - e_f)^2\,|f(t)|^2\, dt. \qquad (3)$$

Then Theorem 1 (keeping only the commutator term) yields Heisenberg's inequality:

Corollary 1. For all $f \in L^2(\mathbb R)$,

$$v_f\, v_{\hat f} \ge \frac{1}{16\pi^2}, \qquad (4)$$

with equality if and only if $f(t) = \lambda e^{at - bt^2}$ for some $\lambda \in \mathbb C \setminus \{0\}$, $a \in \mathbb C$
and $b > 0$, i.e. $f$ is a Gaussian function up to time shifts, modulations and rescalings.
Chirped Gaussians ($b \in \mathbb C$ with $\Re(b) > 0$) attain equality in the full
Robertson-Schrödinger inequality of Theorem 1, which also involves the anticommutator term.

3.1 Variance time-frequency uncertainty principles on different spaces

Usual variance inequalities are defined for functions on the real line, or on
Euclidean spaces. It is important to stress that these inequalities do not generalize
easily to other settings, such as periodic functions, or more general functions on
bounded domains. First, the definitions of mean and variance themselves can be
difficult issues¹. For example, the definition of the mean of a function on the
circle $S^1$ is problematic. Sticking to the above notation, the operator $X$ is not
well defined on $L^2(S^1)$ because of the periodicity; the meaning of $e_f(X)$ is not
clear, and it definitely does not represent the mean value of $f$. Adapted definitions
of mean and variance are required. For example, the case $\mathcal H = L^2(M)$, where
$M$ is a Riemannian manifold, has been studied by various authors (see [TT] and
references therein).

For example, in the case of the circle, one definition of the mean value is given
by $e_f = \arg\langle f, Ef\rangle$, where $E\psi(t) = \exp(2i\pi t)\psi(t)$ (the so-called von Mises
mean, see [1]). From this, an angle-momentum uncertainty inequality has been
obtained in [ST], [g]. Yet additional difficulties appear: first, the bound of the
uncertainty principle is modified (compared to the $L^2(\mathbb R)$ case) and depends
nontrivially on the function $f$ involved. This implies that functions whose
uncertainty product attains the bound are not necessarily minimizers, and the
strict positivity of the lower bound may not be guaranteed. The authors' answer
is to modify the definition of the variance as well (in addition to the
modification of the mean).

Similar problems are encountered in various other situations, such as the
affine uncertainty, which we discuss below. All this shows that alternatives
to variance-based spreading measures are necessary. We will address these in
Section 3.3 below.

3.2 Different representations 

In the Robertson-Schrödinger formulation, the two representation spaces under
consideration (which form the phase space) are $L^2$ spaces on the spectra of two
self-adjoint operators $A$ and $B$. The spectral theorem establishes the existence
of two unitary maps $U_A$ and $U_B$ mapping $\mathcal H$ to the two $L^2$ spaces; the images
of elements of $\mathcal H$ by these operators yield the two representations, for which
uncertainty inequalities can be proven. It is worth noticing that these
representations can be (possibly formally) interpreted as scalar products of elements of
$\mathcal H$ with (possibly generalized) eigenbases of $A$ and $B$.

This allows one to go beyond the time-frequency representation and introduce
generalized phase spaces. We shall assume that the generalized phase
space is associated with self-adjoint operators $A_1, A_2, \dots$ which are infinitesimal
generators of generalized translations, acting on some signal (Hilbert) space $\mathcal H$.
Whenever two operators $A_j, A_l$ are such that there exists a unitary transform
$U$ which turns these two operators into the standard case (operators $X$, $P$
defined above), one can obtain time-frequency type uncertainty inequalities. In
such cases, the lower bound is attained for specific choices of $f$, which are the
images of Gaussian functions by the unitary transformation $U$. We will refer to
this construction as a canonization process. An example where canonization is
possible can be found in Remark 3 below.

¹ The definition and properties of the variance (and other moments) on compact manifolds
constitute in themselves a well-established field of research, named directional statistics.
When this is not the case, the commutator $[A_j, A_l]$ is not a multiple of the
identity and the lower bound generally depends on $f$. This leads to the phenomena
described in Section 3.1. Even worse, if the spectrum of the operator $i[A_j, A_l]$
includes zero, then the lower bound can vanish, revealing that the variance may not
be a suitable spreading measure in this case.



3.2.1 Time-scale variance inequality. 

The classical affine variance inequality is another particular instance of the
Robertson-Schrödinger inequality: let $A = P$ and $B = D = (XP + PX)/2$
denote the infinitesimal generators of translations and dilations, acting on the
Hardy space $H^2(\mathbb R) = \{f \in L^2(\mathbb R),\ \hat f(\nu) = 0\ \forall \nu < 0\}$, which is the natural
setting here.

Explicit calculation shows that $[P, D] = -\frac{i}{2\pi}P$, and it is worth introducing the
scale transform $f \in H^2(\mathbb R) \mapsto \tilde f$, a unitary mapping $H^2(\mathbb R) \to L^2(\mathbb R)$
defined by

$$\tilde f(s) = \int_0^\infty \hat f(\nu)\, e^{2i\pi s\ln\nu}\, \frac{d\nu}{\sqrt\nu} = \int_{-\infty}^{\infty} \hat f(e^u)\, e^{u/2}\, e^{2i\pi us}\, du. \qquad (5)$$

The corresponding Robertson-Schrödinger inequality states:

Corollary 2. For all $f \in H^2(\mathbb R)$,

$$v_f(P)\, v_f(D) \ge \frac{e_f(P)^2}{16\pi^2}, \qquad (6)$$

with equality if and only if $f$ is a Klauder waveform, defined by

$$\hat f(\nu) = K\exp\{a\ln(\nu) - b\nu + i(c\ln(\nu) + d)\}, \quad \nu \in \mathbb R_+, \qquad (7)$$

for some constants $K \in \mathbb C$, $a > -1/2$, $b \in \mathbb R_+$ and $c, d \in \mathbb R$.

It is worth noticing that the right-hand side explicitly depends on $f$, so
that the Klauder waveform, which saturates this inequality, is not necessarily a
minimizer of the product of variances, as analyzed in [24].



3.2.2 Modified time-scale inequality. 

This remark prompted several authors (see [13] for a review) to seek different
forms of averaging, adapted to the affine geometry. This led to the introduction
of adapted means and variances: for $f \in H^2(\mathbb R)$, set

$$\ln \bar e_f = \frac{1}{\|f\|^2}\int_0^\infty |\hat f(\nu)|^2 \ln(\nu)\, d\nu, \qquad \bar v_f = \frac{1}{\|f\|^2}\int_0^\infty \big[\ln(\nu/\bar e_f)\big]^2\, |\hat f(\nu)|^2\, d\nu. \qquad (8)$$

In this new setting, one obtains a more familiar inequality:
Figure 1: Examples of Klauder waveform (left) and Altes waveform (right).



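Proposition 1. For all $f \in H^2(\mathbb R)$, with $\bar v_f$ as in (8),

$$\bar v_f\; v_f(D) \ge \frac{1}{16\pi^2}, \qquad (9)$$

with equality if and only if $f$ takes the form of an Altes waveform, defined by

$$\hat f(\nu) = K\exp\Big\{-\frac12\ln(\nu) - a\ln^2(\nu/b) + i(c\ln(\nu) + d)\Big\}, \quad \nu \in \mathbb R_+, \qquad (10)$$

for some constants $K \in \mathbb C$, $a, b > 0$ and $c, d \in \mathbb R$. The Altes waveform is now a variance minimizer.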



Remark 3 (Canonization). The connection between Klauder's construction and
Altes' can also be interpreted in terms of canonization. Let $U : H^2(\mathbb R) \to L^2(\mathbb R)$
denote the linear operator defined on the Fourier side by $\widehat{Uf}(u) = e^{u/2}\hat f(e^u)$, for $u \in \mathbb R$.
Its adjoint reads $\widehat{U^*g}(\nu) = \hat g(\ln(\nu))/\sqrt\nu$ (for $\nu \in \mathbb R_+$), and it is readily
verified that $U$ is unitary. Consider now the linear operators $\tilde X$ and $\tilde P$ on
$H^2(\mathbb R)$ defined by $\tilde X = U^*XU$ and $\tilde P = U^*PU$. Simple calculations show that
$\tilde X = D$ and $\tilde P = \ln(P)$, these two operators being well defined on
$H^2(\mathbb R)$. Hence $D$ and $\ln(P)$ satisfy the canonical commutation relations on $H^2(\mathbb R)$:

$$[D, \ln(P)] = [\tilde X, \tilde P] = U^*[X, P]U = \frac{i}{2\pi}\mathbf 1_{H^2(\mathbb R)}.$$

Now, given any self-adjoint operator $A$ on $H^2(\mathbb R)$ and any $f \in H^2(\mathbb R)$, set
$g = Uf$; one has $e_f(A) = e_g(UAU^*)$ and $v_f(A) = v_g(UAU^*)$. Therefore,

$$v_f(D)\, v_f(\ln(P)) = v_g(X)\, v_g(P) \ge \frac{1}{16\pi^2},$$

with equality if and only if $g$ is a Gaussian function, i.e. $f$ is an Altes waveform.

3.3 Different dispersion measures 

As stressed above, variance is not always well defined, and even when it is,
variance inequalities may not yield meaningful information. Alternatives have
been proposed in the literature, and we review some of them here. Some of them
show better stability under generalization, and will be more easily transposed to
the discrete case.
3.3.1 Hirschman-Beckner entropic inequality

Following a conjecture by Everett [12] and Hirschman [18], Beckner [2] proved
an inequality involving entropies. Assume $\|f\| = 1$, and define Shannon's
differential entropy by

$$H(f) = -\int_{-\infty}^{\infty} |f(t)|^2 \ln(|f(t)|^2)\, dt. \qquad (11)$$

Then the Hirschman-Beckner uncertainty principle states:

Theorem 2. For all $f \in L^2(\mathbb R)$ with $\|f\| = 1$,

$$H(f) + H(\hat f) \ge 1 - \ln(2), \qquad (12)$$

with equality if and only if $f$ is a Gaussian function (up to the usual
modifications).

The proof relies on the Babenko-Beckner inequality (also called the sharp
Hausdorff-Young inequality) [2]: for $f \in L^p(\mathbb R)$ with $1 \le p \le 2$, let $1/p + 1/p' = 1$; then
$\|\hat f\|_{p'} \le A_p\|f\|_p$, where $A_p = \sqrt{p^{1/p}/p'^{1/p'}}$. Taking logarithms after suitable
normalization yields an inequality involving Rényi entropies (see below for a
definition), which reduces to the Hirschman-Beckner inequality in the limit $p \to 2$. As
remarked in [14], the canonization trick applies in this case as well for the
time-scale uncertainty, and yields a corresponding entropic uncertainty inequality
for time and scale variables.
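A minimal numerical sketch of (12), using a Riemann-sum discretization of the Fourier transform (our own illustration; grid parameters and names are assumptions): the Gaussian $e^{-\pi t^2}$ attains $1 - \ln 2$, whereas $t\,e^{-\pi t^2}$ exceeds it by roughly $0.54$.

```python
import cmath
import math

def grid(half_width, step):
    """Symmetric sampling grid on [-half_width, half_width]."""
    n = int(round(half_width / step))
    return [k * step for k in range(-n, n + 1)]

def fourier(samples, ts, nus, step):
    """Riemann-sum approximation of f^(nu) = int f(t) exp(-2i pi nu t) dt."""
    return [step * sum(f * cmath.exp(-2j * math.pi * nu * t) for f, t in zip(samples, ts))
            for nu in nus]

def entropy(values, step):
    """H = -int p ln p with p = |g|^2 / ||g||^2, i.e. (11) after normalization."""
    w = [abs(g) ** 2 for g in values]
    norm2 = step * sum(w)
    return -step * sum((wi / norm2) * math.log(wi / norm2) for wi in w if wi > 0.0)

def entropy_sum(f, half_width=5.0, step=0.02):
    """H(f) + H(f^) for f normalized to unit L^2 norm."""
    ts = grid(half_width, step)
    samples = [f(t) for t in ts]
    return entropy(samples, step) + entropy(fourier(samples, ts, ts, step), step)

bound = 1 - math.log(2)
gauss = entropy_sum(lambda t: math.exp(-math.pi * t * t))
hermite = entropy_sum(lambda t: t * math.exp(-math.pi * t * t))
print(gauss, hermite, bound)
```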



3.3.2 Concentration on subsets, the Donoho-Stark inequalities 

In [9], Donoho and Stark prove a series of uncertainty inequalities, in both
continuous and discrete settings, using different concentration measures. One
of these is the following: for $f \in L^2(\mathbb R)$ and $\varepsilon > 0$, $f$ is said to be $\varepsilon$-concentrated
in the measurable set $U$ if there exists $g$ supported in $U$ such that $\|f - g\| \le \varepsilon$.
Donoho and Stark prove:

Theorem 3. Assume that $f$, with $\|f\| = 1$, is $\varepsilon_T$-concentrated in $T$ and that $\hat f$ is
$\varepsilon_F$-concentrated in $F$; then

$$|T|\,|F| \ge (1 - (\varepsilon_T + \varepsilon_F))^2. \qquad (13)$$

Remark 4 (Gerchberg-Papoulis algorithm). This uncertainty inequality is used
to prove the convergence of the Gerchberg-Papoulis algorithm for the restoration
of missing samples of band-limited signals, as follows. Let $F, T$ be bounded
measurable subsets of the real line. Given $x \in L^2(\mathbb R)$ such that $\mathrm{Supp}(\hat x) \subset F$, assume
observations of the form

$$y(t) = \begin{cases} x(t) + n(t) & \text{if } t \notin T,\\ n(t) & \text{otherwise,} \end{cases}$$

where $n$ is some noise, simply assumed to be bounded.

Denote by $P_T$ the orthogonal projection onto $L^2$ signals supported by $T$ in the
time domain, and by $P_F$ the corresponding projection in the frequency domain.
Since $P_F x = x$, the observations read $y = (\mathbf 1 - P_T P_F)x + n$.
If $|F|\,|T| < 1$, then $\|P_T P_F\| < 1$ and $x$ is stably recovered by solving

$$\tilde x = (\mathbf 1 - P_T P_F)^{-1} y,$$

where stability means $\|\tilde x - x\| \le C\|n\|$ (one can take $C = (1 - \|P_T P_F\|)^{-1}$).
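A minimal discrete sketch of this restoration scheme (our own illustration in $\mathbb C^N$ with the unitary DFT; the sets, the signal and all function names are hypothetical): the inverse $(\mathbf 1 - P_TP_F)^{-1}y$ is computed by the Neumann series, i.e. by iterating $x_{k+1} = y + P_TP_Fx_k$, which is the classical Gerchberg-Papoulis iteration. In this discrete setting $\|P_TP_F\| \le \sqrt{|T|\,|F|/N}$ (the Hilbert-Schmidt norm), so the iteration is a contraction as soon as $|T|\,|F| < N$.

```python
import cmath
import math

def dft(x):
    """Unitary DFT on C^N."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) / math.sqrt(n)
            for k in range(n)]

def idft(xh):
    """Inverse of the unitary DFT."""
    n = len(xh)
    return [sum(xh[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / math.sqrt(n)
            for t in range(n)]

def project_time(x, t_set):
    """P_T: keep the samples indexed by T, set the others to zero."""
    return [v if t in t_set else 0.0 for t, v in enumerate(x)]

def project_freq(x, f_set):
    """P_F: keep the DFT coefficients indexed by F, set the others to zero."""
    return idft([v if k in f_set else 0.0 for k, v in enumerate(dft(x))])

def gerchberg_papoulis(y, t_set, f_set, n_iter=200):
    """Neumann series for (1 - P_T P_F)^{-1} y: iterate x <- y + P_T P_F x."""
    x = list(y)
    for _ in range(n_iter):
        correction = project_time(project_freq(x, f_set), t_set)
        x = [yi + ci for yi, ci in zip(y, correction)]
    return x

N = 32
F = {0, 1, 2, N - 2, N - 1}   # hypothetical band of the signal, |F| = 5
T = {5, 6, 7}                 # hypothetical missing samples, |T| = 3
x = [math.cos(2 * math.pi * t / N) + 0.5 * math.sin(4 * math.pi * t / N) for t in range(N)]
y = [0.0 if t in T else x[t] for t in range(N)]   # noiseless observations
x_rec = gerchberg_papoulis(y, T, F)
print(math.sqrt(len(T) * len(F) / N))              # upper bound on ||P_T P_F||
print(max(abs(a - b) for a, b in zip(x_rec, x)))   # reconstruction error
```

With noise added to `y` outside of `T`, the same iteration returns the stable solution described above, with error amplification at most $(1 - \|P_TP_F\|)^{-1}$.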

The same paper by Donoho and Stark provides several other versions of the 
uncertainty principle, in view of different applications. 

In a similar spirit, Benedicks' theorem states that every pair of sets of finite
measure $(T, F)$ is strongly annihilating, i.e. there exists a constant $C(T, F)$ such
that for all $f \in L^2(\mathbb R)$,

$$\|f\|^2_{L^2(\mathbb R \setminus T)} + \|\hat f\|^2_{L^2(\mathbb R \setminus F)} \ge \|f\|^2/C(T, F). \qquad (14)$$

We refer to [30] for more details, together with generalizations to higher dimen- 
sions as well as explicit estimates for the constants C(T, F). 

3.4 Non-separable dispersion measures 

Traditional uncertainty principles bound joint concentration in two different
representation spaces. In some situations, it is possible to define a joint
representation space (phase space) and derive corresponding uncertainty principles. This
is in particular the case for time-frequency uncertainty. The quantities of
interest are then functions defined directly on the time-frequency plane, such as the
short-time Fourier transform and the ambiguity function. Given $f, g \in L^2(\mathbb R)$,
the STFT (short-time Fourier transform) of $f$ with window $g$ and the ambiguity
function of $f$ are respectively the functions $V_g f, A_f \in L^2(\mathbb R^2)$ defined by

$$V_g f(b, \nu) = \int_{-\infty}^{\infty} f(t)\,\overline{g(t - b)}\, e^{-2i\pi\nu t}\, dt, \qquad A_f = V_f f. \qquad (15)$$

Concentration properties of such functions have been shown to be relevant in
various contexts, including radar theory (see [23]) or time-frequency operator
approximation theory [B]. We highlight a few relevant criteria and results.

3.4.1 $L^p$-norm of the ambiguity function: Lieb's inequality

E. Lieb (see [3] for example) gives bounds on the concentration of the ambiguity
function (resp. the STFT). Contrary to Heisenberg-type uncertainty inequalities,
which privilege a coordinate system in the phase space (i.e. choose a time and
a frequency axis), bounds on the ambiguity function do not. Here, concentration
is measured by $L^p$ norms, and the bounds are as follows.

Theorem 4. For all $f, g \in L^2(\mathbb R)$,

$$\begin{cases} \|A_f\|_p \ge B_p\|f\|_2^2 & \text{for } p < 2,\\ \|A_f\|_p \le B_p\|f\|_2^2 & \text{for } p > 2,\\ \|A_f\|_2 = \|f\|_2^2, \end{cases} \qquad \begin{cases} \|V_g f\|_p \ge B_p\|g\|_2\|f\|_2 & \text{for } p < 2,\\ \|V_g f\|_p \le B_p\|g\|_2\|f\|_2 & \text{for } p > 2,\\ \|V_g f\|_2 = \|g\|_2\|f\|_2, \end{cases} \qquad (16)$$

where $B_p = (2/p)^{1/p}$ is related to the Babenko-Beckner constants.

The norm $\|\cdot\|_p$ can be regarded as a diversity, or spreading, measure for $p < 2$
and as a sparsity, or concentration, measure for $p > 2$ (see Section 4.2). Again,
the optimum is attained for Gaussian functions. It is worth noticing that, as
opposed to the measures on subsets, these concentration estimates are strongly
influenced by the tail of the Gabor transform or the ambiguity function. It is
not clear at all that the latter is actually relevant in practical applications.
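As a numerical sketch of the optimality of Gaussians in (16) (our own illustration; grid sizes and helper names are arbitrary assumptions), the ambiguity function of $g(t) = 2^{1/4}e^{-\pi t^2}$ can be computed by quadrature. Its modulus is $e^{-\pi(b^2 + \nu^2)/2}$, so that $\|A_g\|_p = (2/p)^{1/p} = B_p$ for every $p$.

```python
import cmath
import math

def gaussian(t):
    """L^2-normalized Gaussian g(t) = 2^(1/4) exp(-pi t^2)."""
    return 2 ** 0.25 * math.exp(-math.pi * t * t)

def ambiguity(f, b, nu, half_width=6.0, step=0.05):
    """Riemann sum for A_f(b, nu) = int f(t) conj(f(t - b)) exp(-2i pi nu t) dt."""
    n = int(round(half_width / step))
    total = 0j
    for k in range(-n, n + 1):
        t = k * step
        total += f(t) * f(t - b).conjugate() * cmath.exp(-2j * math.pi * nu * t)
    return step * total

def ambiguity_modulus_grid(f, half_width=4.0, step=0.2):
    """|A_f| sampled on a square grid of the (b, nu) plane."""
    n = int(round(half_width / step))
    axis = [k * step for k in range(-n, n + 1)]
    return [abs(ambiguity(f, b, nu)) for b in axis for nu in axis], step

def lp_norm(values, step, p):
    """Riemann-sum approximation of the L^p(R^2) norm from grid samples."""
    return (step * step * sum(v ** p for v in values)) ** (1 / p)

values, h = ambiguity_modulus_grid(gaussian)
for p in (1, 2, 4):
    print(p, lp_norm(values, h, p), (2 / p) ** (1 / p))   # numerical norm vs B_p
```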



3.4.2 Time-frequency concentration on compact sets

As a consequence of Lieb's inequalities, one can show (see [17] for a detailed
account) the following concentration properties for ambiguity functions and
STFTs: let $\Omega \subset \mathbb R^2$ be measurable and $\varepsilon > 0$ be such that

$$\iint_\Omega |V_g f(b, \nu)|^2\, db\, d\nu \ge (1 - \varepsilon)\|g\|^2\|f\|^2; \qquad (17)$$

then for all $p > 2$, $|\Omega| \ge (1 - \varepsilon)^{p/(p-2)}(p/2)^{2/(p-2)}$. In particular, for $p = 4$, this yields

$$|\Omega| \ge 2(1 - \varepsilon)^2. \qquad (18)$$
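The bound preceding (18) follows from Theorem 4 by Hölder's inequality; we spell out this standard argument for completeness (it is not necessarily the presentation of [17]). For $p > 2$ and $|\Omega| < \infty$:

```latex
\begin{aligned}
(1-\varepsilon)\,\|g\|_2^2\,\|f\|_2^2
  &\le \iint_\Omega |V_g f(b,\nu)|^2\, db\, d\nu
  && \text{by (17)}\\
  &\le |\Omega|^{1-2/p}\, \|V_g f\|_p^2
  && \text{(H\"older, exponents } \tfrac{p}{2} \text{ and } \tfrac{p}{p-2})\\
  &\le |\Omega|^{1-2/p}\, B_p^2\, \|g\|_2^2\,\|f\|_2^2
  && \text{(Theorem 4, } p > 2),
\end{aligned}
\qquad\Longrightarrow\qquad
|\Omega|^{(p-2)/p} \ge (1-\varepsilon)\,B_p^{-2} = (1-\varepsilon)\,(p/2)^{2/p}.
```

Raising both sides to the power $p/(p-2)$ gives $|\Omega| \ge (1-\varepsilon)^{p/(p-2)}(p/2)^{2/(p-2)}$, and $p = 4$ gives $2(1-\varepsilon)^2$, i.e. (18).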

Remark 5. It would actually be worth investigating possible corollaries of such
estimates, in the spirit of Gerchberg-Papoulis. For instance, assume a measurable
region $\Omega$ of an STFT has been discarded: under which assumptions can one
expect to reconstruct the region stably? Also, when $\Omega$ is large, one can
probably not expect much stability for the reconstruction; what would then be
reasonable regularizations for solving such a time-frequency inpainting problem?



3.4.3 Peakiness of the ambiguity function

Concentration properties of the ambiguity function actually play a central role
in radar detection theory (see e.g. [30]). However, the key desired property
of ambiguity functions, namely peakiness, otherwise stated the existence of a
sharp peak at the origin, is hardly accounted for by $L^p$ norms, entropies or
concentration on compact sets as discussed above.

Ambiguity function peakiness optimization can be formulated in a discrete
setting as follows. Suppose one is given a sampling lattice $\Lambda = b_0\mathbb Z \times \nu_0\mathbb Z$ in the
time-frequency domain; the peakiness of $A_g$ can be optimized by minimizing (with
respect to $g$, with $\|g\| = 1$) the quantity

$$\mu(g) = \max_{(m,n) \ne (0,0)} |A_g(m b_0, n\nu_0)|. \qquad (19)$$

Two examples of waveforms with different concentration properties are given in
Figure 2. The Gaussian function (left) has well-known concentration properties,
while the ambiguity function of a high-order Hermite function (right) is much
more peaked, even though the function itself is poorly localized in both time and
frequency.

Figure 2: 3D plots of the ambiguity functions of a standard Gaussian (left) and
a Hermite function of high order (right).

The quantity in (19) is actually closely connected (as remarked in [55]) to the
so-called coherence, or self-coherence, of the Gabor family $\mathcal G = \{g_{m,n},\ m, n \in \mathbb Z\}$
generated by the time-frequency shifts $g_{mn}(t) = e^{2i\pi n\nu_0 t}\, g(t - m b_0)$ of $g$ on the lattice
$\Lambda$ (see [II] for a detailed account), as

$$\mu = \max_{(m,n) \ne (m',n')} |\langle g_{mn}, g_{m'n'}\rangle|.$$

Hence, optimizing the peakiness of the ambiguity function is closely connected
to minimizing the coherence of the corresponding Gabor family, a property
which has often been advocated in the sparse coding literature.
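The connection between (19) and the coherence can be illustrated in a finite setting. The following is a sketch under our own assumptions ($\mathbb C^L$ with $L = 24$, lattice steps $a = b = 4$, a discrete Gaussian window centred at $0$ on $\mathbb Z_L$, and atoms $\psi_{mn}(t) = e^{2i\pi m b t/L}\psi(t - na)$; all names are hypothetical): the largest modulus of the inner products between distinct atoms coincides with the largest modulus of the ambiguity function at the non-zero lattice points, both normalized by $\|\psi\|^2$.

```python
import cmath
import math

L, A_STEP, B_STEP = 24, 4, 4   # hypothetical length and lattice steps (both divide L)

def discrete_gaussian(length, width):
    """Gaussian window centred at t = 0 on Z_L, using the circular distance."""
    return [math.exp(-math.pi * min(t, length - t) ** 2 / width ** 2) for t in range(length)]

def atom(window, m, n):
    """psi_{mn}(t) = exp(2i pi m b t / L) psi(t - n a mod L)."""
    return [cmath.exp(2j * math.pi * m * B_STEP * t / L) * window[(t - n * A_STEP) % L]
            for t in range(L)]

def inner(x, y):
    """<x, y> = sum_t x_t conj(y_t)."""
    return sum(u * v.conjugate() for u, v in zip(x, y))

def coherence(window):
    """Largest |<psi_mn, psi_m'n'>| over distinct atoms, divided by ||psi||^2."""
    energy = inner(window, window).real
    atoms = [atom(window, m, n) for m in range(L // B_STEP) for n in range(L // A_STEP)]
    return max(abs(inner(atoms[i], atoms[j]))
               for i in range(len(atoms)) for j in range(i + 1, len(atoms))) / energy

def ambiguity_peak(window):
    """Largest |A_psi(n a, m b)| over non-zero lattice points, divided by ||psi||^2."""
    energy = inner(window, window).real
    best = 0.0
    for m in range(L // B_STEP):
        for n in range(L // A_STEP):
            if (m, n) != (0, 0):
                # the window is real, so no complex conjugation is needed here
                value = sum(window[t] * window[(t - n * A_STEP) % L]
                            * cmath.exp(-2j * math.pi * m * B_STEP * t / L) for t in range(L))
                best = max(best, abs(value))
    return best / energy

window = discrete_gaussian(L, width=math.sqrt(L))
print(coherence(window), ambiguity_peak(window))   # the two values agree
```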

Remark 6. As mentioned earlier, sparsity requirements lead to minimizing the
mutual coherence in the case of separable uncertainty principles, and the
self-coherence in the case of non-separable uncertainty principles.



4 Discrete inequalities 
4.1 Introduction 

The uncertainty principle in the discrete setting has gained increasing interest
in recent years due to its connection with sparse analysis and compressive
sensing. Sparsity has been shown to be an instrumental concept in various
applications, such as signal compression (obviously), signal denoising, blind signal
separation,... We first review here the main sparsity/diversity measures that 
have been used in the signal processing literature, show that they are closely 
connected and present several versions of the uncertainty principle. Then we 
present a few examples of their adaptation to phase space concentration prob- 
lems. 

In the discrete finite-dimensional setting, we shall use the Hilbert space $\mathcal H =
\mathbb C^L$ as a model signal space. In terms of signal representations, we consider finite
frames $\mathcal U = \{u_\lambda \in \mathcal H,\ \lambda \in \Lambda\}$ (see Section 2.1 for motivations and definitions)
in $\mathcal H$, and denote by $U : x \in \mathcal H \mapsto \{\langle x, u_\lambda\rangle,\ \lambda \in \Lambda\}$ the corresponding analysis
operator, and by $U^*$ its adjoint, the synthesis operator.

Time-frequency frames offer a convenient and well-established framework
for developing ideas and concepts, and most of the approaches described below
have been developed using Gabor frames. For the sake of completeness, we give
here the basic notation that will be used in the sequel. Given a reference vector
$\psi \in \mathcal H$ (called the mother waveform, or the window), a corresponding Gabor
system associates with $\psi$ a family of time-frequency translates

$$\psi_{mn}(t) = e^{2i\pi m\nu_0 t}\, \psi(t - n b_0), \quad m \in \mathbb Z_M,\ n \in \mathbb Z_N,\ t \in \mathbb Z_L,$$

where $b_0 = a$ and $\nu_0 = b/L$ (with $a, b$ integers that divide $L$, so that $N = L/a$ and $M = L/b$) are constants
that define a time-frequency lattice $\Lambda$. The corresponding transform $V_\psi$ associates
with any $x \in \mathcal H$ a function $(m, n) \in \Lambda \mapsto V_\psi x(m, n) = \langle x, \psi_{mn}\rangle$. When
$a = b = 1$, the corresponding transform is called the short-time Fourier transform
(STFT).

The ambiguity function of the window $\psi$ is the function $A_\psi$, defined as the
STFT of the waveform $\psi$ using $\psi$ as mother waveform; in other words, $A_\psi = V_\psi\psi$.



4.2 Sparsity measures 

As mentioned earlier, the variance as a measure of spreading is problematic in 
the finite setting both with its definition and the inequalities it yields. More 
adapted measures have been proposed in the literature, among which the cele- 
brated ^ 1 -norm used in optimization problems, entropy used by physicists and in 
information theory and support measures favored for sparsity related problems. 



4.2.1 $\ell^p$-norms and support measure

Given a finite-dimensional vector $x \in \mathbb C^L$, it is customary in signal processing
applications to use $\ell^p$ (quasi-)norms of $x$ to measure the sparsity ($p > 2$) or
diversity ($p < 2$) of the vector $x$:

$$\|x\|_p^p = \sum_{k=1}^{L} |x_k|^p. \qquad (20)$$

These quantities (except for $p = 0$) do not fully qualify as sparsity or diversity
measures since they depend on the $\ell^2$-norm of $x$. To circumvent this problem,
normalized $\ell^p$-norms are also considered:

$$\gamma_p(x) = \frac{\|x\|_p}{\|x\|_2} = \|\tilde x\|_p, \quad \text{with } \tilde x = x/\|x\|_2. \qquad (21)$$

The normalized quantity $|\tilde x|^2$ may be seen as a probability distribution.

The special case $p = 0$ gives the support measure (number of non-zero
coefficients), also denoted $\ell^0$. This is not a norm but is obviously a sparsity measure.
4.2.2 Rényi entropies

Entropy is a notion of disorder or spreading for physicists and a well-established
notion for estimating the amount of information in information theory. Given
$\alpha \in \mathbb R_+$ and a vector $x \in \mathbb C^L$, the corresponding Rényi entropy [55] $R_\alpha(x)$ is
defined as

$$R_\alpha(x) = \frac{2\alpha}{1 - \alpha}\ln(\gamma_{2\alpha}(x)), \qquad \alpha \ne 1, \qquad (22)$$

or equivalently $R_\alpha(x) = \frac{1}{1-\alpha}\ln\sum_k |\tilde x_k|^{2\alpha}$.
Rényi entropies provide diversity measures, i.e. sparsity is obtained by
minimizing the entropies. The limit $\alpha \to 1$ is not singular, and yields the Shannon
entropy

$$S(x) = -\sum_k |\tilde x_k|^2 \ln(|\tilde x_k|^2). \qquad (23)$$

These notions have proven useful for measuring energy concentration in
signal processing, especially in the time-frequency framework [T] and [19].



4.2.3 Relations between sparsity measures 



Equation (22) shows that minimizing the normalized $\ell^p$-norm $\gamma_p$ with $p < 2$ is equivalent to
minimizing the Rényi entropy for $\alpha = p/2$. Note also that for $p = 2\alpha > 2$,
$1/(1 - \alpha)$ is negative, and minimizing the $\alpha$-entropy leads to the same results as
maximizing the $\ell^p$-norm. The limit $\alpha \to 1$ gives the Shannon entropy. Note
also that the limit $\alpha \to 0$ is not singular and gives the logarithm of the support
size. Hence, all these measures are related through Eq. (22) and belong to the
same family.
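These relations are easy to check numerically. The following sketch (our own; the names are hypothetical) implements (21)-(23), computing $\ln\gamma_p$ in the log domain to avoid overflow for small $p$, and verifies that $\alpha \to 1$ recovers the Shannon entropy, that $\alpha \to 0$ recovers the logarithm of the support size, and that $R_\alpha$ is non-increasing in $\alpha$.

```python
import math

def log_normalized_lp(x, p):
    """ln gamma_p(x) = ln(||x||_p / ||x||_2), p > 0, computed in the log domain."""
    norm2 = math.sqrt(sum(abs(v) ** 2 for v in x))
    return math.log(sum((abs(v) / norm2) ** p for v in x if v != 0)) / p

def renyi(x, alpha):
    """R_alpha(x) = 2 alpha / (1 - alpha) * ln gamma_{2 alpha}(x), alpha != 1."""
    if alpha == 0:
        # limit alpha -> 0: logarithm of the support size
        return math.log(sum(1 for v in x if v != 0))
    return 2 * alpha / (1 - alpha) * log_normalized_lp(x, 2 * alpha)

def shannon(x):
    """S(x) = -sum_k |x~_k|^2 ln |x~_k|^2, with x~ = x / ||x||_2."""
    energy = sum(abs(v) ** 2 for v in x)
    probs = [abs(v) ** 2 / energy for v in x if v != 0]
    return -sum(q * math.log(q) for q in probs)

x = [3.0, -1.0, 0.5j, 0.0, 0.0, 0.25]
print(renyi(x, 0.999999), shannon(x))             # alpha -> 1 recovers Shannon
print(renyi(x, 1e-9), renyi(x, 0), math.log(4))   # alpha -> 0 recovers ln |supp x|
print([renyi(x, a) for a in (0.25, 0.5, 2.0, 4.0)])  # non-increasing in alpha
```

For a flat vector such as `[1.0, 1.0, 1.0, 1.0]`, all the Rényi entropies coincide with $\ln 4$, the maximal diversity on four entries.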

So far, the focus has been put on the Rényi entropies and their limit, the
Shannon entropy; Tsallis entropies $T_\alpha(x) = -(\gamma_{2\alpha}(x)^{2\alpha} - 1)/(\alpha - 1)$, initially
introduced in statistical physics, may be seen as first-order approximations
of Rényi entropies and can also be used along the same lines. A comparison
between these measures could be an interesting issue.



4.3 Sparsity related uncertainties in finite dimensional set- 
tings 

Discrete uncertainty inequalities have received significant attention in many 
domains of mathematics, physics and engineering. We focus here on the aspects 
that have been mostly used in signal processing. 



4.3.1 Support uncertainty principles 

The core idea is that in finite-dimensional settings, two orthonormal bases
provide two different representations of the same object, and that the same object
cannot be represented sparsely in two "very different bases". In the original
work by Donoho and Huo [5], the finite-dimensional Kronecker and Fourier
bases were used, and Elad and Bruckstein [TU] extended the result to arbitrary
orthonormal bases. The $\ell^0$ quasi-norm is used to measure diversity.
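Theorem 5. Let $\Phi = \{\varphi_n, n \in \mathbb{Z}_N\}$ and $\Psi = \{\psi_n, n \in \mathbb{Z}_N\}$ denote two orthonormal bases of $\mathbb{C}^N$. For all $x \in \mathbb{C}^N$, denote by $\alpha \in \mathbb{C}^N$ and $\beta \in \mathbb{C}^N$ the coefficients of the expansion of $x$ on $\Phi$ and $\Psi$ respectively. Then if $x \neq 0$,

$$\|\alpha\|_0 \cdot \|\beta\|_0 \ge \frac{1}{\mu^2} \quad\text{and}\quad \|\alpha\|_0 + \|\beta\|_0 \ge \frac{2}{\mu} , \tag{24}$$

where $\mu = \mu(\Phi, \Psi) = \max_{m,n} |\langle \varphi_m, \psi_n \rangle|$ is the mutual coherence of $\Phi$ and $\Psi$.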

Remark 7. The Welch bound states that the mutual coherence of the union of two orthonormal bases of $\mathbb{C}^N$ cannot be smaller than $1/\sqrt{N}$; the bound is sharp, equality being attained in the case of the Kronecker and Fourier bases.
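The following Python sketch (our illustration; helper names are ours) computes the mutual coherence of the Kronecker and Fourier bases of $\mathbb{C}^N$, which equals $1/\sqrt{N}$, and checks the support bounds of Theorem 5 on a picket fence, i.e. a Dirac comb with spacing $a$ and $b = N/a$ teeth. The unitary DFT is written out explicitly so that the block has no dependencies. For the picket fence the product bound in (24) is attained, while the sum bound $a + b \ge 2\sqrt{N}$ is attained only when $a = b = \sqrt{N}$.

```python
import cmath
import math


def dft_unitary(x):
    """Coefficients of x on the Fourier basis f_k[n] = exp(2i pi k n / N) / sqrt(N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)) / math.sqrt(N)
            for k in range(N)]


def support_size(coeffs, tol=1e-9):
    """l^0 'norm': number of coefficients above a numerical tolerance."""
    return sum(1 for c in coeffs if abs(c) > tol)


N = 12
# mu = max_{m,k} |<delta_m, f_k>| = max |f_k[m]|
mu = max(abs(cmath.exp(2j * math.pi * k * m / N)) / math.sqrt(N)
         for k in range(N) for m in range(N))

a = 3                                  # comb spacing; b = N // a = 4 teeth
x = [1.0 if n % a == 0 else 0.0 for n in range(N)]
alpha = x                              # Kronecker (Dirac) coefficients
beta = dft_unitary(x)                  # Fourier coefficients
print(mu, 1 / math.sqrt(N))
print(support_size(alpha) * support_size(beta), 1 / mu ** 2)   # product bound of (24)
print(support_size(alpha) + support_size(beta), 2 / mu)        # sum bound of (24)
```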

Remark 8. The result was later extended by Donoho and Elad [7] to arbitrary frames, using the notion of Kruskal rank (or spark): the Kruskal rank of a family of vectors $\mathcal{D} = \{\varphi_0, \dots, \varphi_{N-1}\}$ in a finite-dimensional space $\mathcal{H}$ is the smallest number $r_K$ such that there exists a family of $r_K$ linearly dependent vectors in $\mathcal{D}$. Assume that $x \in \mathcal{H}$, $x \neq 0$, has two different representations in $\mathcal{D}$:

$$\text{if } x = \sum_{n=0}^{N-1} \alpha_n \varphi_n = \sum_{n=0}^{N-1} \beta_n \varphi_n , \quad\text{then}\quad \|\alpha\|_0 + \|\beta\|_0 \ge r_K .$$

Bounds describing the relationship between the Kruskal rank and coherence have also been given in [7].
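For small families, the Kruskal rank as defined in Remark 8 can be computed by brute force. The Python sketch below (helper names are ours; the cost is exponential, so it only suits toy examples) uses exact rational arithmetic to test linear dependence, and checks the bound $\|\alpha\|_0 + \|\beta\|_0 \ge r_K$ on two representations of the same vector.

```python
from fractions import Fraction
from itertools import combinations


def rank(vectors):
    """Exact rank of a list of integer/rational vectors (Gauss-Jordan elimination)."""
    rows = [[Fraction(v) for v in vec] for vec in vectors]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [u - factor * w for u, w in zip(rows[i], rows[r])]
        r += 1
    return r


def kruskal_rank(family):
    """Smallest number of linearly dependent vectors in the family (Remark 8)."""
    for k in range(1, len(family) + 1):
        if any(rank(list(sub)) < k for sub in combinations(family, k)):
            return k
    return None  # all vectors are linearly independent


def l0(coeffs):
    return sum(1 for c in coeffs if c != 0)


# Three vectors in R^2: any two are independent, the three together are dependent.
D = [(1, 0), (0, 1), (1, 1)]
r_K = kruskal_rank(D)
# x = (1, 1) = 1*d0 + 1*d1 + 0*d2 = 0*d0 + 0*d1 + 1*d2
alpha, beta = (1, 1, 0), (0, 0, 1)
print(r_K, l0(alpha) + l0(beta))  # 3 3: the bound is attained here
```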

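Let us also mention at this point the discrete versions of the concentration inequality (14), obtained in [18]. Given two orthonormal bases of $\mathbb{C}^N$ with mutual coherence $\mu$, and denoting by $\alpha$ and $\beta$ the coefficient sequences of $x$ on these bases (as in Theorem 5), let $T$, $F$ be two subsets of the two index sets $\{0, 1, \dots, N-1\}$ and assume $|T| \cdot |F| < 1/\mu^2$. Then for all $x$,

$$\|\alpha\|_{\ell^2(\mathbb{Z}_N \setminus T)} + \|\beta\|_{\ell^2(\mathbb{Z}_N \setminus F)} \ge \left(1 + \frac{1}{1 - \mu\sqrt{|T|\,|F|}}\right)^{-1} \|x\|_2 . \tag{25}$$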
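Inequality (25) can be checked numerically in the Kronecker/Fourier case, where $\mu = 1/\sqrt{N}$. The Python sketch below (our illustration; the sets $T$, $F$ and the test vectors are arbitrary choices) computes both sides for random vectors and for a unit spike located in $T$. Random sampling only illustrates the inequality; it does not prove it.

```python
import cmath
import math
import random


def dft_unitary(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)) / math.sqrt(N)
            for k in range(N)]


def norm_outside(coeffs, index_set):
    """l^2 norm of the coefficients whose index is NOT in index_set."""
    return math.sqrt(sum(abs(c) ** 2 for i, c in enumerate(coeffs) if i not in index_set))


def ratio(x, T, F):
    """Left-hand side of (25) divided by ||x||_2 (Dirac coefficients = x itself)."""
    lhs = norm_outside(x, T) + norm_outside(dft_unitary(x), F)
    return lhs / math.sqrt(sum(abs(v) ** 2 for v in x))


N = 16
mu = 1 / math.sqrt(N)
T, F = {0, 5}, {1, 2}                  # |T| |F| = 4 < 1 / mu^2 = 16
s = mu * math.sqrt(len(T) * len(F))
constant = 1 / (1 + 1 / (1 - s))       # right-hand side constant of (25)

random.seed(0)
samples = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]
           for _ in range(200)]
spike = [1.0 if n == 0 else 0.0 for n in range(N)]
worst = min(ratio(x, T, F) for x in samples + [spike])
print(f"constant = {constant:.4f}, smallest observed ratio = {worst:.4f}")
```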

Remark 9. As expected, all these inequalities involve a strictly positive lower bound, which is controlled by the coherence $\mu$.

In a recent study [27], the support inequalities have been extended from basis representations to frame representations. More precisely, for any vector $x \in \mathcal{H}$, bounds of the following form have been obtained.

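Theorem 6. Let $\mathcal{U} = \{\mathcal{U}^{(1)}, \dots, \mathcal{U}^{(n)}\}$ denote a set of $n$ frames in a Hilbert space $\mathcal{H}$, with analysis operators $U^{(1)}, \dots, U^{(n)}$. Then for any $x \in \mathcal{H}$, $x \neq 0$,

$$\sum_{k=1}^{n} \|U^{(k)} x\|_0 \ge \frac{n}{\mu_*} , \tag{26}$$

where $\mu_*$ is a generalized coherence, defined as follows:

$$\mu_* = \inf_{\tilde{\mathcal{U}}} \inf_{r} \left( \mu_r(\tilde{\mathcal{U}}^{(1)}, \mathcal{U}^{(2)}) \cdots \mu_r(\tilde{\mathcal{U}}^{(n-1)}, \mathcal{U}^{(n)})\, \mu_r(\tilde{\mathcal{U}}^{(n)}, \mathcal{U}^{(1)}) \right)^{1/n} , \tag{27}$$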
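where the infimum over $\tilde{\mathcal{U}}$ is taken over the family of all possible dual frames $\tilde{\mathcal{U}} = \{\tilde{\mathcal{U}}^{(1)}, \dots, \tilde{\mathcal{U}}^{(n)}\}$ of the elements of $\mathcal{U}$, and the r-coherences $\mu_r$ generalize the mutual coherence by replacing the supremum of the inner products between frame elements with $\ell^r$-type norms (see [27] for the precise definition).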

Therefore, the control parameter here is the generalized coherence $\mu_*$. If the canonical dual frame is chosen, $\mu_r$ is often smaller than $\mu$, which yields an improvement. This suggests new definitions of the coherence that may further improve the bound.



4.3.2 Entropic uncertainty 

In Section 4.2 we introduced the entropy as a measure of concentration, and we also stated earlier an entropic version of the uncertainty principle for the continuous case (Hirschman-Beckner). It turns out that the latter can be extended to more general situations than time-frequency uncertainty. For example, in a discrete setting, given two orthonormal bases, it was proven independently by Maassen and Uffink [25] and by Dembo, Cover and Thomas [5] that for any $x$, the coefficient sequences $\alpha$, $\beta$ of the two corresponding representations of $x$ satisfy

$$S(\alpha) + S(\beta) \ge -2\ln\mu , \tag{29}$$

with $\mu$ the mutual coherence of the two bases. In the particular case of the Fourier and Kronecker bases, $\mu = 1/\sqrt{N}$, which leads to the bound $S(\alpha) + S(\beta) \ge \ln N$, similar to the result given in Proposition 2 for the ambiguity function; the picket fences are the minimizers (see next section).
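A quick numerical illustration of (29) in the Kronecker/Fourier case, where the right-hand side equals $\ln N$. The Python sketch below (our illustration, with arbitrary test vectors) evaluates $S(\alpha) + S(\beta)$ for random real vectors and for a picket fence, for which equality holds: with spacing 3 and $N = 12$, $S(\alpha) = \ln 4$ and $S(\beta) = \ln 3$.

```python
import cmath
import math
import random


def dft_unitary(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)) / math.sqrt(N)
            for k in range(N)]


def shannon(coeffs, tol=1e-24):
    """Shannon entropy of p_n = |c_n|^2 / sum |c|^2, ignoring numerically zero weights."""
    energy = sum(abs(c) ** 2 for c in coeffs)
    probs = [abs(c) ** 2 / energy for c in coeffs]
    return -sum(p * math.log(p) for p in probs if p > tol)


N = 12
bound = -2 * math.log(1 / math.sqrt(N))       # -2 ln(mu) = ln N

random.seed(1)
random_sums = []
for _ in range(100):
    x = [random.gauss(0, 1) for _ in range(N)]
    random_sums.append(shannon(x) + shannon(dft_unitary(x)))

picket = [1.0 if n % 3 == 0 else 0.0 for n in range(N)]   # 4 teeth, spacing 3
picket_sum = shannon(picket) + shannon(dft_unitary(picket))
print(bound, min(random_sums), picket_sum)
```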

These results were generalized recently in [27], where entropic inequalities for frame analysis coefficients were obtained.

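Theorem 7. Let $\mathcal{H}$ be a separable Hilbert space, and let $\mathcal{U}$ and $\mathcal{V}$ be two frames of $\mathcal{H}$, with bounds $A_{\mathcal{U}}, B_{\mathcal{U}}$ and $A_{\mathcal{V}}, B_{\mathcal{V}}$. Let $\tilde{\mathcal{U}}$ and $\tilde{\mathcal{V}}$ denote corresponding dual frames, and set

Let $r \in [1, 2)$. For all $\alpha \in [r/2, 1]$, let $\beta = \alpha(r-2)/(r-2\alpha) \in [1, \infty]$. For $x \in \mathcal{H}$, denote by $a$ and $b$ the sequences of analysis coefficients of $x$ with respect to $\mathcal{U}$ and $\mathcal{V}$. Then

1. The Renyi entropies satisfy the following bound:

$$(2-r)R_\alpha(a) + rR_\beta(b) \ge -2\ln\left(\nu_r(\tilde{\mathcal{U}}, \mathcal{U}, \mathcal{V})\right) - \ln\left(\rho(\mathcal{U}, \mathcal{V})\right) . \tag{31}$$

2. If $\mathcal{U}$ and $\mathcal{V}$ are tight frames, the bound becomes

$$(2-r)R_\alpha(a) + rR_\beta(b) \ge -2\ln\left(\nu_r(\tilde{\mathcal{U}}, \mathcal{U}, \mathcal{V})\right) . \tag{32}$$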



Figure 3: Picket fence (left) vs. periodized Gaussian (right)



3. In this case, the following inequality between Shannon entropies holds true:

$$S(a) + S(b) \ge -2\ln\left(\mu_*(\tilde{\mathcal{U}}, \mathcal{U}, \tilde{\mathcal{V}}, \mathcal{V})\right) , \tag{33}$$

where $\mu_*$ is defined in (27).



The proof is both a refinement and a frame generalization of the proofs in [35] and [5]. A main result of [27] is that these (significantly more complex) bounds indeed provide stronger estimates than the Maassen-Uffink inequality, even in the case of orthonormal bases. They are however presumably sub-optimal for non-tight frames since, in some specific limit, they yield support inequalities that turn out to be weaker than the ones presented above.



4.3.3 Phase space uncertainty and localization 

Again, as in the continuous case, uncertainty inequalities defined directly in phase space can be proven. For example, in the joint time-frequency case, finite-dimensional analogues of Lieb's inequalities have been proven in [13].

Proposition 2. Let $\psi \in \mathbb{C}^N$ be such that $\|\psi\|_2 = 1$, and let $A_\psi$ denote its ambiguity function. Then, assuming $p < 2$,

$$\|A_\psi\|_p \ge N^{1/p} \quad\text{and}\quad S(A_\psi) \ge \log(N) . \tag{34}$$

The inequality is an equality for the family of "picket fence" signals, i.e. translated and modulated copies of the following periodic series of Kronecker deltas:

$$u(t) = \frac{1}{\sqrt{b}} \sum_{n=1}^{b} \delta(t - an), \qquad ab = N .$$
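The Python sketch below (our own illustration) computes a discrete ambiguity function, assuming the convention $A_\psi(m,k) = \sum_n \psi[n]\,\overline{\psi[n-m]}\,e^{-2i\pi kn/N}$ with indices modulo $N$, for which $A_\psi(0,0) = \|\psi\|_2^2$ and $\sum_{m,k}|A_\psi(m,k)|^2 = N\|\psi\|_2^4$. Assuming the entropy in (34) is computed from the normalized distribution $|A_\psi|^2/N$, it checks that a picket fence attains equality in (34) (here with $p = 1$), whereas a periodized Gaussian, chosen arbitrarily for comparison, does not.

```python
import cmath
import math


def ambiguity(psi):
    """A[m][k] = sum_n psi[n] conj(psi[n - m]) exp(-2i pi k n / N), indices mod N."""
    N = len(psi)
    return [[sum(psi[n] * psi[(n - m) % N].conjugate() * cmath.exp(-2j * math.pi * k * n / N)
                 for n in range(N))
             for k in range(N)]
            for m in range(N)]


def lp_norm(A, p):
    return sum(abs(v) ** p for row in A for v in row) ** (1 / p)


def entropy(A):
    """Shannon entropy of |A|^2 normalized to a probability distribution."""
    weights = [abs(v) ** 2 for row in A for v in row]
    total = sum(weights)
    return -sum(w / total * math.log(w / total) for w in weights if w / total > 1e-24)


N, a = 12, 3
b = N // a
picket = [1 / math.sqrt(b) if n % a == 0 else 0.0 for n in range(N)]   # unit-norm comb
g = [sum(math.exp(-math.pi * (n + j * N - N / 2) ** 2 / N) for j in range(-3, 4))
     for n in range(N)]                                                 # periodized Gaussian
g_norm = math.sqrt(sum(v * v for v in g))
gauss = [v / g_norm for v in g]

results = {}
for name, psi in (("picket fence", picket), ("periodized Gaussian", gauss)):
    A = ambiguity(psi)
    results[name] = (lp_norm(A, 1.0), entropy(A))
    print(name, results[name], "bounds:", N ** 1.0, math.log(N))
```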

Hence, the result is now completely different from the one obtained in the continuous case: the optimum is no longer the Gaussian function (which, incidentally, is not well defined in finite-dimensional situations), but a completely different object, as exemplified in Fig. 3, where a picket fence and a periodized Gaussian window are displayed. This is mainly due to the choice of underlying model signal spaces (generally $L^2(\mathbb{R})$), which impose some decay at infinity.

Remark 10. It is worth noticing that the above diversity measures (norms or entropy of the ambiguity function) are non-convex functionals of the window sequence. For example, if $N$ is a prime number, there are (up to normalization) $2N$ window vectors (picket fences) whose ambiguity function is optimally concentrated (in terms of entropy). When $N$ is not prime, the degeneracy is even higher.

4.4 Two signal processing applications 

The uncertainty principle and its consequences have long been considered as constraints and barriers to precise knowledge and measurement. The innovative idea behind compressive sensing, where the uncertainty principle is turned into an advantage for retrieving information, promises many exciting developments. In this section we present two prototype applications which involve the uncertainty principle. In the same spirit as compressive sensing, the first one shows how the uncertainty principle can be used for the separation of signals. The second application is a more classical one, which provides time-frequency windows with minimum uncertainty under some additional constraints.

4.4.1 Sparsity-based signal separation problem 

The signal separation problem is an extremely ill-defined signal processing problem, which is also important in many engineering applications. In a nutshell, it consists in splitting a signal $x$ into a sum of components $x_k$, or parts, of different nature:

$$x = x_1 + x_2 + \cdots + x_n .$$

While this notion of different nature often makes sense in applied domains, it is generally extremely difficult to formalize mathematically. Sparsity (see [55] for an introduction in the data separation context) offers a convenient framework for approaching such a notion, according to the following paradigm:

Signals of different nature are sparsely represented in different waveform systems.

Given a union of several frames (or frames of subspaces) $\mathcal{U}^{(1)}, \mathcal{U}^{(2)}, \dots, \mathcal{U}^{(n)}$ in a reference Hilbert space $\mathcal{H}$, the separation problem can be given various formulations, among which the so-called analysis and synthesis formulations.

• In the synthesis formulation, each component $x_k$ is synthesized using the $k$-th frame in the form $\sum_j \alpha_j^{(k)} u_j^{(k)}$, and the synthesis coefficients $\alpha^{(k)}$ are sparsity constrained. The problem is then settled as

$$\min_{\alpha^{(1)}, \dots, \alpha^{(n)}} \sum_{k=1}^{n} \|\alpha^{(k)}\|_0 , \quad\text{under constraint}\quad x = \sum_{k=1}^{n} \sum_j \alpha_j^{(k)} u_j^{(k)} .$$

• In analysis formulations, the splitting of $x$ is sought directly as the solution of

$$\min_{x_1, \dots, x_n \in \mathcal{H}} \sum_{k=1}^{n} \|U^{(k)} x_k\|_0 , \quad\text{under constraint}\quad x = x_1 + x_2 + \cdots + x_n ,$$

where $U^{(k)}$ denotes the analysis operator of frame $k$.

In the case of two frames, it may be proven that if one is given a splitting $x = x_1 + x_2$, obtained via any algorithm, and if $\|U^{(1)} x_1\|_0 + \|U^{(2)} x_2\|_0$ is small enough, then this splitting is necessarily optimal. More precisely:

Corollary 3. Let $\mathcal{U}^{(1)}$ and $\mathcal{U}^{(2)}$ denote two frames in $\mathcal{H}$. For any $x \in \mathcal{H}$, let $x = x_1 + x_2$ denote a splitting such that

$$\|U^{(1)} x_1\|_0 + \|U^{(2)} x_2\|_0 < \frac{1}{\mu_*} .$$

Then this splitting minimizes $\|U^{(1)} x_1\|_0 + \|U^{(2)} x_2\|_0$.
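A toy Python check of the sufficient condition in Corollary 3 (our illustration; the function names are hypothetical). It is specialized to two orthonormal bases, the Kronecker basis (analysis operator = identity) and the Fourier basis (analysis operator = unitary DFT), and uses the classical coherence $\mu = 1/\sqrt{N}$ of Theorem 5 in place of $\mu_*$: the same uniqueness argument, with (24) instead of (26), shows that a splitting with $\|U^{(1)}x_1\|_0 + \|U^{(2)}x_2\|_0 < 1/\mu$ is optimal. When the test fails, nothing can be concluded.

```python
import cmath
import math


def dft_unitary(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)) / math.sqrt(N)
            for k in range(N)]


def l0(coeffs, tol=1e-9):
    return sum(1 for c in coeffs if abs(c) > tol)


def splitting_cost(x1, x2):
    """Analysis sparsity ||U1 x1||_0 + ||U2 x2||_0 with U1 = identity, U2 = unitary DFT."""
    return l0(x1) + l0(dft_unitary(x2))


def certified_optimal(x1, x2, mu):
    """Sufficient (not necessary) optimality test: cost < 1/mu."""
    return splitting_cost(x1, x2) < 1 / mu


N = 16
mu = 1 / math.sqrt(N)          # 1/mu = 4

spikes = [2.0 if n == 3 else 0.0 for n in range(N)]
tone = [cmath.exp(2j * math.pi * 5 * n / N) for n in range(N)]        # one Fourier atom
print(splitting_cost(spikes, tone), certified_optimal(spikes, tone, mu))      # 2 True

spikes2 = [1.0 if n in (1, 7, 11) else 0.0 for n in range(N)]
tones2 = [cmath.exp(2j * math.pi * 2 * n / N) + cmath.exp(2j * math.pi * 9 * n / N)
          for n in range(N)]
print(splitting_cost(spikes2, tones2), certified_optimal(spikes2, tones2, mu))  # 5 False
```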

Hence, the performance of analysis-based signal separation relies heavily on the value of the generalized coherence $\mu_*$.

The extension to splittings involving more than two parts is more cumbersome. It can be attacked recursively, but this involves combinatorial problems which are likely to be difficult to solve.

4.4.2 Sparsity-based algorithms for window optimization in time-frequency analysis

Proposition 2 shows that the finite-dimensional waveforms that optimize standard sparsity measures in the ambiguity domain are localized neither in time nor in frequency. This was also confirmed by numerical experiments reported in [13], where numerical schemes for ambiguity function optimization were proposed. This approach has so far been developed mainly for time-frequency representations, but is generic enough to be adapted to various situations.

More precisely, the problem addressed by these algorithms is the following: solve

$$\psi_{\mathrm{opt}} = \arg\max_{\|\psi\|_2 = 1} \sum_{z \in \Lambda} F(|A_\psi(z)|, z)\, |A_\psi(z)|^2 , \tag{35}$$

for some density function $F : \mathbb{R}_+ \times \Lambda \to \mathbb{R}_+$, chosen so as to enforce some specific localization or sparsity properties. A simple approach, based upon quadratic approximations of the target functional, reduces the problem to iterative diagonalizations of Gabor multipliers.
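The objective in (35) is easy to evaluate for a given window. The Python sketch below (our illustration) assumes the ambiguity convention $A_\psi(m,k) = \sum_n \psi[n]\,\overline{\psi[n-m]}\,e^{-2i\pi kn/N}$ on $\Lambda = \mathbb{Z}_N \times \mathbb{Z}_N$, and passes $F$ as a Python function. With $F(|A|, z) = |A|^{p-2}$ the objective is $\|A_\psi\|_p^p$. Since $|A_\psi| \le 1$ and $\sum_z |A_\psi(z)|^2 = N$ for unit-norm $\psi$, this objective is at most $N$ when $p > 2$, a value reached by picket fences.

```python
import cmath
import math


def ambiguity(psi):
    N = len(psi)
    return [[sum(psi[n] * psi[(n - m) % N].conjugate() * cmath.exp(-2j * math.pi * k * n / N)
                 for n in range(N))
             for k in range(N)]
            for m in range(N)]


def objective(psi, F):
    """sum over z in Z_N x Z_N of F(|A_psi(z)|, z) |A_psi(z)|^2, the functional in (35)."""
    A = ambiguity(psi)
    N = len(psi)
    return sum(F(abs(A[m][k]), (m, k)) * abs(A[m][k]) ** 2 for m in range(N) for k in range(N))


def lp_density(p):
    """F(|A|, z) = |A|^(p - 2): the objective becomes ||A_psi||_p^p."""
    return lambda magnitude, z: magnitude ** (p - 2)


N = 12
picket = [0.5 if n % 3 == 0 else 0.0 for n in range(N)]             # unit-norm comb, 4 teeth
g = [math.exp(-math.pi * (n - N / 2) ** 2 / N) for n in range(N)]   # sampled Gaussian
g_norm = math.sqrt(sum(v * v for v in g))
gauss = [v / g_norm for v in g]

F4 = lp_density(4)
print(objective(picket, F4), objective(gauss, F4), "upper bound:", N)
```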

Two specific situations were considered and analyzed, namely: 

• the optimization of the ambiguity function sparsity through the maximization of some $\ell^p$ norm (with $p > 2$), which naturally leads to the choice $F(|A_\psi(z)|, z) = |A_\psi(z)|^{p-2}$. The functional to optimize is non-convex, and the outcome of the algorithm depends on the initialization. In agreement with the theory, numerical experiments can converge to picket fence signals (Dirac combs) as limit windows. In addition, for some choices of the initial window, a Gaussian-like function (the Gaussian is the sparsest window in the continuous case) may also be obtained (local optimum).
• the optimization of the concentration within specific regions, through choices such as $F(|A_\psi(z)|, z) = F_0(z)$, for some non-negative function $F_0$ satisfying symmetry constraints, due to the particular properties of the ambiguity function ($A_\psi(0,0) = 1$, $|A_\psi(z)| = |A_\psi(-z)|$). The algorithm was shown to converge to optimal windows matching the shape of $F_0$ in the ambiguity plane; that is to say, the resulting window is sharply concentrated and satisfies the shape constraint provided by $F_0$. However, convergence is not guaranteed for all $F_0$, and convergence issues should be treated in more detail in future work. The algorithm has been shown to converge for simple shapes such as discs, ellipses or rectangles in the ambiguity plane. Numerical illustrations can be found in Fig. 4 (disc shape and rectangular/diamond shape). Since the ambiguity plane is discrete, the masks are polygons rather than perfect circles and diamonds, which explains the unusual shape of the ambiguity function, with interferences. For some more complex shapes (such as stars, for example), the algorithm was found not to converge; convergence problems are important issues, currently under study.
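One possible reading of the "iterative diagonalization of Gabor multipliers" for the mask choice $F = F_0(z)$ is sketched below in Python; it is our own hypothetical illustration, not the algorithm of [13]. Freezing one copy of the window at the previous iterate $\psi_{\text{prev}}$ turns the quartic objective $\sum_z F_0(z)|\langle\psi, \pi(z)\psi\rangle|^2$ into the quadratic form of the Gabor multiplier $M = \sum_z F_0(z)\, g_z g_z^{*}$, with $g_z = \pi(z)\psi_{\text{prev}}$ and $\pi(m,k)\varphi[n] = e^{2i\pi kn/N}\varphi[n-m]$. The next iterate is taken as a power-iteration approximation of the top eigenvector of $M$. Since convergence is not guaranteed in general, the sketch only reports the objective values along the iterations.

```python
import cmath
import math


def tf_shift(phi, m, k):
    """pi(m, k) phi[n] = exp(2i pi k n / N) phi[n - m], indices mod N."""
    N = len(phi)
    return [cmath.exp(2j * math.pi * k * n / N) * phi[(n - m) % N] for n in range(N)]


def inner(u, v):
    """<u, v> = sum_n u[n] conj(v[n])."""
    return sum(a * b.conjugate() for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(sum(abs(c) ** 2 for c in v))
    return [c / norm for c in v]


def mask_objective(psi, mask):
    """sum_z F0(z) |A_psi(z)|^2 with A_psi(z) = <psi, pi(z) psi>."""
    return sum(w * abs(inner(psi, tf_shift(psi, m, k))) ** 2 for (m, k), w in mask.items())


def gabor_multiplier_step(psi_prev, mask, power_iters=60):
    """Approximate top eigenvector of M = sum_z F0(z) g_z g_z^*, g_z = pi(z) psi_prev."""
    atoms = [(w, tf_shift(psi_prev, m, k)) for (m, k), w in mask.items()]
    v = list(psi_prev)
    for _ in range(power_iters):
        Mv = [0j] * len(v)
        for w, g in atoms:                 # M v = sum_z F0(z) <v, g_z> g_z
            c = w * inner(v, g)
            Mv = [acc + c * gn for acc, gn in zip(Mv, g)]
        v = normalize(Mv)
    return v


def disc_mask(N, radius):
    """Indicator of a disc centred at the origin of Z_N x Z_N (cyclic distances)."""
    def cyclic(i):
        return min(i % N, (-i) % N)
    return {(m, k): 1.0 for m in range(N) for k in range(N)
            if cyclic(m) ** 2 + cyclic(k) ** 2 <= radius ** 2}


N = 16
mask = disc_mask(N, 2.0)
psi = normalize([1.0 if n in (15, 0, 1) else 0.0 for n in range(N)])  # arbitrary start
history = [mask_objective(psi, mask)]
for _ in range(10):
    psi = gabor_multiplier_step(psi, mask)
    history.append(mask_objective(psi, mask))
print([round(h, 4) for h in history])
```

Every value in `history` lies between $F_0(0,0) = 1$ (the contribution of $A_\psi(0,0)$) and the mask area, since $|A_\psi(z)| \le 1$ for unit-norm windows.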

Such approaches are actually fairly generic, and there is hope that they can 
be generalized so as to be able to generate waveforms that are optimal with 
respect to large classes of criteria. 




Figure 4: Logarithm of the modulus of optimal ambiguity functions with mask $F(|A_\psi(z)|, z) = F_0(z)$. Left: optimal function obtained for $F_0$ the indicator of a disk. Right: optimal function obtained for $F_0$ the indicator of a diamond.



5 Conclusions 

We have reviewed in this paper a number of instances of uncertainty inequalities, in both continuous and discrete situations. Through these particular examples we have focused on specific properties of, and connections between, these different instances. Indeed, from its first statement in quantum mechanics to its newest developments in signal processing, the uncertainty principle has undergone many parallel evolutions and generalizations in different domains. It was not a smooth and straightforward progress, as different situations call for adapted spreading measures, yield different inequalities, bounds and minimizers (if any), and involve different proof techniques. A main point we have tried to make in this paper is that several classical approaches, developed in the continuous setting, do not go through in more general situations, such as discrete settings. For example, the very notions of mean and variance do not necessarily make sense in general. In such situations, other, more generic, spreading measures such as the (Renyi) entropies and $\ell^p$-norms can be used. We attempted in this paper to point out the close connection between these quantities and to suggest other candidates for further research.

Signal representations were first understood as the function itself and its Fourier transform. This was then generalized to any projection on orthonormal bases, and now to any set of frame coefficients. These latter representations play an important role in signal processing and bring new insight into the uncertainty bounds. The introduction of the mutual coherence, which measures how close two representations can be, as well as of the phase space coherence, which measures the redundancy of a corresponding waveform system, leads to new corresponding bounds. A careful choice of this quantity is needed to obtain the sharpest possible bound. We showed how this notion of coherence can be extended and generalized, using $\ell^p$-norms with $p \neq \infty$.

Concerning the uncertainty optimizers, i.e. waveforms that optimize an 
uncertainty inequality, they are of very different nature in the discrete and 
continuous cases. In a few words, in the continuous situations, some underlying 
choice of functional space implies localization as a consequence of concentration 
(as measured by the chosen spreading criterion). This is no longer the case in 
the discrete world where localization and concentration have different meanings. 

Therefore, the transition from continuous to discrete spaces is far more complex than simply replacing integrals by sums, and a more thorough analysis of the connections between them is clearly needed.